release: issue #23 tcpretrans fix + low-latency perf (PR #21) #24

Merged
dmitriimaksimovdevelop merged 7 commits into master from release-test/issue-23-plus-perf on Apr 15, 2026

@dmitriimaksimovdevelop
Owner

Summary

Combined release branch that pulls two in-flight workstreams into master: the tcpretrans fix for issue #23 and the low-latency performance work from PR #21.

Both branches were merged via non-fast-forward merges to preserve their individual
histories; there were no textual conflicts.

Verification

Validated end-to-end on a live Ubuntu 24.04.3 / kernel 6.8.0-90 host (the exact
environment reported in #23):

  • Old behaviour: tcpretrans collector ran at Tier 2 (BCC) and dumped ~3 KB of
    `Failed to compile BPF module` stderr into melisai's log, losing the data.
  • New behaviour: tcpretrans runs at Tier 3 (native eBPF via embedded `.o`),
    no compile dumps, clean collection.
  • Residual BCC-affected tools (opensnoop / biolatency / oomkill) still fail on
    Ubuntu 24.04 BCC 0.29, but the new diagnostic path emits a single actionable
    log line each instead of the full compiler output. Native-eBPF migration for
    those tools is a follow-up tracked in `context/NATIVE_EBPF_MIGRATION.md`.
  • Build matrix (`go build`, `go vet`, `go test ./...`) passes on
    linux/amd64, linux/arm64, darwin; `make generate` produces a working ELF
    BPF object under clang 18 + bpftool v7.4 + kernel 6.8 BTF.
  • Docker test (privileged, `--pid=host --net=host`) also yields Tier 3 tcpretrans.

Test plan

  • `go build ./...` on darwin / linux-amd64 / linux-arm64
  • `go test -count=1 ./...` green across all packages
  • `make generate` produces `internal/ebpf/bpf/tcpretrans.o`
  • `./melisai collect --profile quick` on Ubuntu 24.04 / 6.8.0-90 host — tcpretrans Tier 3, clean log
  • Same scenario in a privileged Docker container
  • LVH multi-kernel matrix (deferred — SSH into `kind:*` qcow2 images via SLIRP port-forward times out; needs image/config work, captured as follow-up)

Closes #23.
Supersedes #21 (its content is merged here).

dmitriimaksimovdevelop and others added 7 commits April 7, 2026 10:32
Apply HFT-inspired low-latency best practices to reduce observer effect
and improve melisai's own performance during system profiling.

Hot path optimizations:
- Pre-compile regex patterns at package level (parsers.go)
- Pre-allocate slices with capacity hints in all parsers
- Pre-lowercase headers once instead of per-event in ParseTabularEvents
- Replace fmt.Sprintf("%v") with type-switch formatKey() in aggregation
- Make DefaultThresholds a package-level singleton (avoid 37 closure allocs)
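
As a rough illustration of three of the hot-path items above (package-level regex, capacity hints, and a type-switch key formatter), here is a minimal sketch. The regex pattern, `parseComms`, and the exact set of types handled by `formatKey` are assumptions made for this example; they are not taken from melisai's parsers.go or aggregation code.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// pidLine is compiled once at package initialisation instead of on every
// call. The pattern is illustrative only.
var pidLine = regexp.MustCompile(`^(\d+)\s+(\S+)`)

// formatKey converts common key types to strings without going through
// fmt.Sprintf("%v"), which relies on reflection. Unknown types still fall
// back to fmt so the result stays the same for them.
func formatKey(v any) string {
	switch k := v.(type) {
	case string:
		return k
	case int:
		return strconv.Itoa(k)
	case int64:
		return strconv.FormatInt(k, 10)
	case uint64:
		return strconv.FormatUint(k, 10)
	case float64:
		return strconv.FormatFloat(k, 'g', -1, 64)
	default:
		return fmt.Sprintf("%v", k)
	}
}

// parseComms (hypothetical helper) extracts the command name from each
// matching line. The result slice is pre-allocated with the input length
// as a capacity hint, so append never has to grow it.
func parseComms(lines []string) []string {
	out := make([]string, 0, len(lines))
	for _, l := range lines {
		if m := pidLine.FindStringSubmatch(strings.TrimSpace(l)); m != nil {
			out = append(out, m[2])
		}
	}
	return out
}

func main() {
	fmt.Println(formatKey(42), formatKey("tcp"), parseComms([]string{"101 nginx", "bad"}))
}
```

The fallback to `fmt.Sprintf` in the default case keeps the sketch safe for key types it does not list, at the cost of the slower path for those types.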

Memory/IO optimizations:
- Add sync.Pool for bytes.Buffer reuse across 67+ BCC tool executions
- Switch diff.LoadReport to streaming json.NewDecoder (halve peak memory)
- Single-pass category scan in AI prompt generation (3 loops → 1)
- Manual binary.LittleEndian parsing in eBPF tcpretrans (avoid reflection)
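
The buffer pooling and the reflection-free event decoding listed above might look roughly like the sketch below. The `retransEvent` layout (field names, order, and the 12-byte size) is hypothetical and chosen only for this example; the real tcpretrans event struct is defined by the BPF program and is not shown in this PR text. `captureOutput` is likewise an illustrative name.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers so that each tool run does not have
// to allocate a fresh bytes.Buffer.
var bufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}

// captureOutput (hypothetical helper) copies data through a pooled buffer.
// buf.String() makes a copy, so returning the buffer to the pool is safe.
func captureOutput(data []byte) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)
	buf.Write(data)
	return buf.String()
}

// retransEvent is an assumed, simplified event layout for illustration.
type retransEvent struct {
	PID   uint32
	SPort uint16
	DPort uint16
	State uint32
}

const retransEventSize = 12 // 4 + 2 + 2 + 4 bytes in the assumed layout

// decodeRetransEvent reads fixed offsets with binary.LittleEndian instead
// of binary.Read, which uses reflection to walk the struct.
func decodeRetransEvent(b []byte) (retransEvent, error) {
	if len(b) < retransEventSize {
		return retransEvent{}, errors.New("short event record")
	}
	return retransEvent{
		PID:   binary.LittleEndian.Uint32(b[0:4]),
		SPort: binary.LittleEndian.Uint16(b[4:6]),
		DPort: binary.LittleEndian.Uint16(b[6:8]),
		State: binary.LittleEndian.Uint32(b[8:12]),
	}, nil
}

func main() {
	raw := []byte{0x39, 0x05, 0, 0, 0x50, 0, 0xBB, 0x01, 1, 0, 0, 0}
	ev, err := decodeRetransEvent(raw)
	fmt.Println(captureOutput([]byte("ok")), ev, err)
}
```

Manual decoding trades a little verbosity for predictable cost per event, and the explicit length check replaces the error that `binary.Read` would otherwise return on a short read.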

Observability:
- Add --pprof flag for CPU profiling of melisai itself
- Add 11 benchmark tests for parsers, aggregation, and anomaly detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- NATIVE_EBPF_MIGRATION.md — full plan with phases, patterns, validation
- PROMPT_NATIVE_EBPF.md — reusable prompt template for AI-assisted porting

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… logs (#23)

On Ubuntu 24.04 + kernel 6.8+, the BCC-bundled libbpf headers in bpfcc-tools
0.29 lack the BPF_CGROUP_UNIX_* enums that newer kernel headers reference,
causing tcpretrans-bpfcc to fail compilation with a multi-kilobyte C error
dump. melisai was silently losing tcpretrans data and polluting user logs
with the raw compiler output.

Root cause: the native-eBPF Tier 3 fallback existed but relied on a
tcpretrans.o that was never shipped — not embedded in the Go binary and
not built by goreleaser — so release users always hit the broken BCC path.

Fix, three layers:
  1. internal/ebpf/embed.go — //go:embed all:bpf + LoadEmbeddedObject helper;
     loader.go and ebpf_tcpretrans.go now prefer embedded bytes with disk
     fallback for local dev builds.
  2. executor.go — detect 'Failed to compile BPF module' / Python tracebacks
     in stderr and emit a single actionable line instead of a 3KB escape-
     dumped compiler log. Default case truncates stderr to 200 chars.
  3. Makefile + CI — `make generate` now produces vmlinux.h via bpftool btf
     dump and validates toolchain presence; release.yml and ci.yml install
     clang/llvm/linux-tools-generic/libbpf-dev and run make generate before
     go build so release binaries always carry the embedded BPF object.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Building tcpretrans.o on Ubuntu 24.04 with libbpf-dev 1.3 and a BTF-derived
vmlinux.h failed with:
  - AF_INET / AF_INET6 undeclared (they are userspace socket constants
    from <sys/socket.h>, never embedded in vmlinux.h)
  - bpf_ntohs implicit-function-declaration (it is defined in
    <bpf/bpf_endian.h>, which was not included)

Add the missing include and local #define fallbacks so `make generate`
actually produces a working object and //go:embed can ship it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@dmitriimaksimovdevelop dmitriimaksimovdevelop merged commit 19aa128 into master Apr 15, 2026
2 checks passed
@dmitriimaksimovdevelop dmitriimaksimovdevelop deleted the release-test/issue-23-plus-perf branch April 15, 2026 11:13

Development

Successfully merging this pull request may close these issues.

Error when running the command sudo ./melisai collect --profile quick -o report.json